Return-Path: <timbl>
Received: by nxoc01.cern.ch (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0)
id AA24645; Tue, 4 Feb 92 08:44:25 GMT+0100
Date: Tue, 4 Feb 92 08:44:25 GMT+0100
From: timbl (Tim Berners-Lee)
Message-Id: <9202040744.AA24645@nxoc01.cern.ch>
Received: by NeXT Mailer (1.62)
To: emv@cic.net
Subject: Re: using WWW to follow gopher links
Cc: www-talk@nxoc01.cern.ch, gopher@boombox.micro.umn.edu,
wais-talk@quake.think.com
Ed,
All good stuff -- the world is coming together.
What do you think is the most useful www option for tracing what's out there?
I have two suggestions - one is a -list option (or something) which makes
www return only a list of related documents, one on each line.
Another is one which will recursively run down a tree. The
trouble with the latter is telling it where to stop. Depth isn't really good enough,
as you probably also want to constrain it to only gopher files, for example.
Perhaps the most flexible would be just the first option, with a perl etc script
around it to be flexible. I'd like to see, for example, lists of all telnet sites
referenced by gopher or www links, and a wais server for www documents and gopher
nodes. My guess is that one index could handle the lot so long as one trimmed
off the few places where people have gatewayed in the entire ftp world, etc.
Then I'd like to see a www server for that index so that one could jump straight to
the document wherever it came from.... I have to write an article today; maybe
tomorrow I'll put in www -list.
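[The -list idea composes naturally with standard Unix filters. A minimal sketch, simulating the one-address-per-line output the proposed option would print (the addresses below are invented for illustration, since no such flag exists yet):

```shell
# Simulate the one-address-per-line output "www -list" would print,
# then keep only the telnet sites, de-duplicated.
# (The addresses here are made up for illustration.)
printf 'gopher://host.a/1/dir\ntelnet://library.b/\ntelnet://library.b/\nhttp://info.c/doc\n' \
    | grep '^telnet://' \
    | sort -u
# prints: telnet://library.b/
```

With a real -list flag, the printf stage would be replaced by `www -list <address>`.]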
KUTGW
Tim
[PS: I assume you meant -p rather than -np in the www command. Perhaps we
should put in -np if it is more intuitive than -p for no paging.
I'll look at the CR problem.]
__________________original message follows
Tim,
Some more results of wais/www/gopher collaboration.
I have a new WAIS server running at wais.cic.net, called
"midwest-weather". It's fed by loading in a bunch of weather reports
from a gopher at Minnesota every hour. That system gets them from the
"weather underground" at Michigan using some hairy expect scripts, I
figured it'd be easier to get things out of gopher instead.
The script looks like:
WEATHER=gopher://mermaid.micro.umn.edu:150/00/Weather
www -n -np ${WEATHER}/Indiana/Fort%20Wayne | sed -e 's/.$//' > fort-wayne.in
www -n -np ${WEATHER}/Indiana/Indianapolis | sed -e 's/.$//' > indianapolis.in
www -n -np ${WEATHER}/Indiana/South%20Bend | sed -e 's/.$//' > south-bend.in
[...]
For some reason the gopher files are coming out of www with extra ^M's
on the end, as if they were DOS files; so the sed thing gets rid of them.
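[A note on that cleanup: `sed -e 's/.$//'` chops the last character of every line, even a line that happens to lack the stray ^M. If the input might be mixed, `tr` expresses the same fix more safely, deleting only carriage-return characters wherever they occur:

```shell
# Delete carriage returns wherever they appear, instead of blindly
# chopping the last character of each line.
printf 'Fort Wayne: fair\r\nIndianapolis: snow\n' | tr -d '\r'
```

Either filter produces plain LF-terminated Unix text; the difference only matters when some lines arrive without the trailing CR.]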
I don't see a way to do this with just one invocation of www, so
instead it runs once for each file.
Neither gopher nor WWW has the notion of a "recursive directory
listing", either some complete overview of the structure of the system
or some skeleton outline. (I realize it's arbitrarily hard to do so
since any link could point off anywhere else.) That makes it tougher
to do an archie-style catalog. I think it wouldn't be that hard to
build a tree-walker for gopher that prints out a list of the
directories on every system that it can find and also the text of all
of the stuff that's in the ".about" directories. At the very least
I'm doing some of that by hand now (just a script like the one above)
& waising it so I have some clue what all is out there. *not* a
replacement for the per-site indexes, but a cross-section.
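[Such a tree-walker could be sketched in shell, assuming some lister command (for instance, the -list option proposed above) that prints one related address per line for a given node. Depth plus a gopher:// filter serve as the stopping rule, per the caveat that any link can point off anywhere:

```shell
#!/bin/sh
# Hypothetical tree-walker sketch. $LISTER names a command that prints one
# related address per line for a given node (e.g. a future "www -list").
# Positional parameters keep node/depth local across the recursion,
# which plain sh variables would not.
walk() {
    # $1 = node address, $2 = remaining depth
    echo "$1"
    [ "$2" -le 0 ] && return 0
    for child in $($LISTER "$1" | grep '^gopher://'); do
        walk "$child" $(($2 - 1))
    done
}

# e.g.: LISTER='www -list' walk gopher://mermaid.micro.umn.edu:150/00/Weather 2
```

This prints every reachable gopher directory down to the given depth; piping the output through `sort -u` would de-duplicate nodes reached by more than one path.]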
--Ed